Clinical statistics for non-statisticians: Day one
Steve Simon
Introduction
Tell us one interesting number about yourself
Examples
I have traveled to eight countries outside the United States
(Canada, Italy, China, France, Russia, England, Holland, and Iceland)
I did not learn how to drive until I was 29 years old
My highest chess rating was 1802, but I am not that good any more.
Your turn
A bit more about myself
PhD in Statistics in 1982 from the University of Iowa
Currently full professor
Part-time statistical consultant
Funded on 18 research grants
Over 100 peer-reviewed publications
Website with over 2,000 pages
Many invitations to talk at conferences
Outline of the three-day course
Day one: Numerical summaries and data visualization
Day two: Hypothesis testing and sampling
Day three: Statistical tests to compare treatment to a control and regression models
My goal: help you to become a better consumer of statistics
Day one topics
Numerical summaries
When should you present the mean versus the median?
When should you present the range versus the standard deviation?
How should you display percentages?
Why should you round liberally?
Day one topics (continued)
Data visualization
How should you display continuous data?
Why is the normal bell-shaped curve important?
How should you display categorical data?
How do you illustrate trends and patterns?
What are some common mistakes in the choice of colors?
Counting and proportions
Counts are the most common statistic
Counts are error prone
Counts require a solid operational definition
Student exercise
Count the number of occurrences of the letter “e”.
A quality control program is easiest
to implement from the top down.
Make sure that you understand the
the commitment of time and money
that is involved. Every workplace is
different, but think about allocating
10% of your time and 10% of the
time of all your employees to
quality control.
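Because hand counts are error prone, it can help to have a reference count to check against. The Python sketch below counts the letter in the passage above. The names `PASSAGE` and `count_letter` are illustrative only. Whether a capital "E" should count is an assumption you have to settle first, which is the kind of operational definition a count depends on.

```python
# The exercise passage, joined into one string (line breaks become spaces).
PASSAGE = (
    "A quality control program is easiest "
    "to implement from the top down. "
    "Make sure that you understand the "
    "the commitment of time and money "
    "that is involved. Every workplace is "
    "different, but think about allocating "
    "10% of your time and 10% of the "
    "time of all your employees to "
    "quality control."
)


def count_letter(text, letter, ignore_case=False):
    """Count occurrences of a single letter in text."""
    if ignore_case:
        text, letter = text.lower(), letter.lower()
    return text.count(letter)


# Two possible operational definitions of "the letter e".
lowercase_only = count_letter(PASSAGE, "e")
either_case = count_letter(PASSAGE, "e", ignore_case=True)

print("Lowercase e only:", lowercase_only)
print("Counting capital E as well:", either_case)
```

The two answers differ by one, because of the capital E in "Every". Neither is wrong; they answer slightly different questions.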
Counting sperm
Figure 1: Image of a haemocytometer
Tables of counts, using the Titanic data.
Figure 2: Counts of survival by gender
Percentages dividing by column totals
Figure 3: Column percentages
Percentages dividing by row totals
Row percentages
Percentages dividing by grand total
Cell percentages
My recommendations
Treatment or exposure as rows
Outcome as columns
Usually report row percentages
Female mortality rate: 33%
Male mortality rate: 83%
But sometimes column percentages
Survivors: 68% female, 32% male
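The three kinds of percentages differ only in the denominator. The Python sketch below works them out for the Titanic counts of survival by sex. The dictionary layout and variable names are illustrative. The male non-survivor count of 709 is an assumption, picked because it reproduces the 83% and 17% quoted for males.

```python
# Counts of survival by sex. The male "No" count (709) is an assumption,
# chosen so that the row percentages match the 83% and 17% quoted for males.
counts = {
    "Female": {"No": 154, "Yes": 308},
    "Male": {"No": 709, "Yes": 142},
}
outcomes = ["No", "Yes"]

# The three possible denominators.
row_totals = {sex: sum(row.values()) for sex, row in counts.items()}
col_totals = {o: sum(counts[sex][o] for sex in counts) for o in outcomes}
grand_total = sum(row_totals.values())


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole


row_pct = {s: {o: percent(counts[s][o], row_totals[s]) for o in outcomes}
           for s in counts}
col_pct = {s: {o: percent(counts[s][o], col_totals[o]) for o in outcomes}
           for s in counts}
cell_pct = {s: {o: percent(counts[s][o], grand_total) for o in outcomes}
            for s in counts}

# Print one line per cell, rounded liberally to whole percentages.
for sex in counts:
    for outcome in outcomes:
        print(
            f"{sex:<7}{outcome:<4}"
            f"row {row_pct[sex][outcome]:3.0f}%  "
            f"column {col_pct[sex][outcome]:3.0f}%  "
            f"cell {cell_pct[sex][outcome]:3.0f}%"
        )
```

The row percentages answer "what fraction of women (or men) died?", while the column percentages answer "what fraction of the survivors (or deaths) were women?".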
Some rationale for these choices
My way
             Survived
             No          Yes
Sex  Female  33% (154)   67% (308)
     Male    83% (709)   17% (142)
Not my way
               Sex
               Female      Male
Survived  No   33% (154)   83% (709)
          Yes  67% (308)   17% (142)
On your own
Calculate row and column percentages for the following tables. Interpret your results.
Figure 4: Titanic passenger class counts
Figure 5: Titanic child counts
The mean (average)
Figure 6: Cartoon image of Professor Mean
The median
Figure 7: Road with a median strip
Calculation of the mean and median
Mean
Add up all the values, divide by the sample size
Median
Sort the data
Select the middle value if n is odd
Go halfway between the two middle values if n is even
Use the median when outliers or skewness might distort your conclusions
Often, either is fine
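The calculation steps above can be sketched in a few lines of Python. The function names are illustrative, and the two small data sets are hypothetical, invented only to show what a single outlier does to each summary.

```python
def mean(values):
    """Add up all the values, then divide by the sample size."""
    return sum(values) / len(values)


def median(values):
    """Sort the data, then take the middle value.

    If n is even, go halfway between the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical data, invented for illustration only.
typical = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

print("Typical data:  mean", mean(typical), "median", median(typical))
print("With outlier:  mean", mean(with_outlier), "median", median(with_outlier))
```

In this made-up example, replacing 6 with 60 moves the mean from 4 to 14.8, while the median stays at 4.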
Criticisms of the mean and median
Are you combining apples and onions?
Are you ignoring minorities?
Use of the mean for ordinal data
Gould 1985
Figure 8: Gould 1985
Bridge 2001, PMID: 11405531
Figure 9: Bridge and McKenzie 2001
Bridge 2001, PMID: 11405531 (continued)
The measurement of airway resistance by the interrupter technique (Rint) needs standardization. Should measurements be made during the expiratory or inspiratory phase of tidal breathing? In reported studies, the measurement of Rint has been calculated as the median or mean of a small number of values, is there an important difference?
Bridge 2001, PMID: 11405531 (continued)
In the present data the mean of a set of values contributing to a measurement was not significantly different from the median. However, the use of the median has been recommended since it is less affected by possible outlying values such as might be included by fully automated equipment.
Chen 2019, PMID: 31806195
Figure 10: Chen et al 2019
Chen 2019, PMID: 31806195 (continued)
Background: The prices of newly approved cancer drugs have risen over the past decades. A key policy question is whether the clinical gains offered by these drugs in treating specific cancer indications justify the price increases.
Chen 2019, PMID: 31806195 (continued)
Results: We found that between 1995 and 2012, price increases outstripped median survival gains, a finding consistent with previous literature. Nevertheless, price per mean life-year gained increased at a considerably slower rate, suggesting that new drugs have been more effective in achieving longer-term survival. Between 2013 and 2017, price increases reflected equally large gains in median and mean survival, resulting in a flat profile for benefit-adjusted launch prices in recent years.